Distribution and cluster analysis of predicted intrinsically disordered protein Pfam domains

Authors

  • Robert W Williams
  • Bin Xue
  • Vladimir N Uversky
  • A Keith Dunker
Abstract

The Pfam database groups regions of proteins by how well hidden Markov models (HMMs) can be trained to recognize similarities among them. Conservation pressure is probably at play here. The Pfam seed training set includes sequence and structure information and is drawn largely from the PDB. A long-standing hypothesis among intrinsically disordered protein (IDP) investigators holds that conservation pressures are also at play in the evolution of different kinds of intrinsic disorder, but we find that predicted intrinsic disorder (PID) is not always conserved across Pfam domains. Here we analyze distributions and clusters of PID regions in 193,024 members of the version 23.0 Pfam seed database. To include the maximum information available for proteins that remain unfolded in solution, we combine the 10 linearly independent Kidera factors [1-3] for the amino acids with PONDR [4] predictions of disorder tendency, transforming the sequence of each Pfam member into an 11-column matrix whose number of rows equals the length of the Pfam region. Cluster analyses of the set of all regions, including those that are folded, show 6 groupings of domains. Cluster analyses of domains with mean VSL2b scores greater than 0.5 (half predicted disorder or more) show at least 3 separated groups. We hypothesize that grouping sets into shorter sequences of more uniform length will reveal more information about intrinsic disorder and lead to more finely structured and perhaps more accurate predictions. HMMs could be trained to include this information.
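As an illustration of the encoding described in the abstract, the following minimal Python sketch builds one row per residue from 10 Kidera factors plus a per-residue disorder score, then applies the mean-score cutoff of 0.5. It is not the authors' code. The function names `encode_region` and `is_mostly_disordered` are hypothetical, the Kidera table here is a zero-filled placeholder that would need to be replaced with the published values, and the disorder scores are assumed to come from an external predictor such as PONDR VSL2b, which this sketch does not call.

```python
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_region(
    sequence: str,
    kidera: Dict[str, Sequence[float]],
    disorder_scores: Sequence[float],
) -> List[List[float]]:
    """Return an L x 11 matrix: 10 Kidera factors plus one disorder score per residue."""
    if len(sequence) != len(disorder_scores):
        raise ValueError("need exactly one disorder score per residue")
    rows = []
    for residue, score in zip(sequence, disorder_scores):
        factors = kidera[residue]  # KeyError for residues missing from the table
        if len(factors) != 10:
            raise ValueError("expected 10 Kidera factors per residue")
        rows.append([float(f) for f in factors] + [float(score)])
    return rows


def is_mostly_disordered(matrix: List[List[float]], threshold: float = 0.5) -> bool:
    """True when the mean of the last column (disorder score) exceeds the threshold."""
    if not matrix:
        raise ValueError("empty region")
    scores = [row[-1] for row in matrix]
    return sum(scores) / len(scores) > threshold


# Placeholder table (all zeros), NOT the real Kidera factors.
placeholder_kidera = {aa: [0.0] * 10 for aa in AMINO_ACIDS}

# Made-up three-residue region with made-up disorder scores, for shape checking only.
matrix = encode_region("MKT", placeholder_kidera, [0.2, 0.7, 0.9])
print(len(matrix), len(matrix[0]))
print(is_mostly_disordered(matrix))
```

Because the number of rows equals the region length, regions of different lengths yield matrices of different shapes. How those matrices are compared for clustering is not stated in the abstract, so that step is left out of the sketch.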

Related resources

MobiDB 2.0: an improved database of intrinsically disordered and mobile proteins

MobiDB (http://mobidb.bio.unipd.it/) is a database of intrinsically disordered and mobile proteins. Intrinsically disordered regions are key for the function of numerous proteins. Here we provide a new version of MobiDB, a centralized source aimed at providing the most complete picture on different flavors of disorder in protein structures covering all UniProt sequences (currently over 80 milli...

Intrinsically Disordered Proteins Display No Preference for Chaperone Binding In Vivo

Intrinsically disordered/unstructured proteins (IDPs) are extremely sensitive to proteolysis in vitro, but show no enhanced degradation rates in vivo. Their existence and functioning may be explained if IDPs are preferentially associated with chaperones in the cell, which may offer protection against degradation by proteases. To test this inference, we took pairwise interaction data from high-t...

Functional Annotation of Intrinsically Disordered Domains by Their Amino Acid Content Using IDD Navigator

Function prediction of intrinsically disordered domains (IDDs) using sequence similarity methods is limited by their high mutability and prevalence of low complexity regions. We describe a novel method for identifying similar IDDs by a similarity metric based on amino acid composition and identify significantly overrepresented Gene Ontology (GO) and Pfam domain annotations within highly similar...

Serine/arginine-rich splicing factors belong to a class of intrinsically disordered proteins

Serine/arginine-rich (SR) splicing factors play an important role in constitutive and alternative splicing as well as during several steps of RNA metabolism. Despite the wealth of functional information about SR proteins accumulated to-date, structural knowledge about the members of this family is very limited. To gain a better insight into structure-function relationships of SR proteins, we pe...

Hidden Markov Models and Large-Scale Genome Analysis

PFAM is a database of multiple alignments and hidden Markov models (HMMs) of common, conserved protein domains. PFAM HMMs complement BLAST analysis in the annotation of the C. elegans and human genome sequencing projects at Washington University and the Sanger Centre. PFAM2, based on full, gapped multiple alignments of structural and/or functional protein domains, currently contains 527 models....

Journal title:

Volume 1, Issue

Pages -

Publication date: 2013